These files replicate the analysis from "Access to safe drinking water: Experimental evidence from new water sources in Bangladesh".

The directory "data" contains three files of data.  The datasets have been checked, cleaned, merged and anonymized from raw datafiles.  Distances (e.g., between households and water sources) are pre-calculated so that location data can be removed to preserve anonymity of study participants.  The three datasets provided are:

data_anonymized.dta: This is a household-level panel dataset in wide format.  Each row is a household in a study community or in one of the MICS primary sampling units.  Variables measured at follow-up are denoted fu*.  Variables for households in the MICS sample are obtained from publicly available data from the "Bangladesh Multiple Indicator Cluster Survey 2012-2013", accessible as of February 2021 at https://microdata.worldbank.org/index.php/catalog/2533.

bac_entry_lags.dta: This dataset contains the results of all fecal contamination tests conducted during the study, together with the date each sample was collected and the date each test result was entered.  We use summary statistics from these data to correct for cases in which results were erroneously entered too soon or too long after collection.
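
As an illustration of the kind of check these dates allow, the sketch below computes the lag in days between collection and entry and flags tests whose lag falls outside a chosen window.  It is written in Python rather than Stata and is not the code used in the study: the function names, the example dates and the 1-to-3-day window are all hypothetical, and the rule actually applied is the one implemented in the dofiles.

```python
from datetime import date


def entry_lag_days(collected, entered):
    """Number of days between sample collection and result entry."""
    return (entered - collected).days


def flag_implausible(lags, min_days, max_days):
    """True for each lag that falls outside [min_days, max_days]."""
    return [not (min_days <= lag <= max_days) for lag in lags]


# Made-up (collection date, entry date) pairs, for illustration only.
tests = [
    (date(2000, 1, 1), date(2000, 1, 2)),  # entered the next day
    (date(2000, 1, 1), date(2000, 1, 1)),  # entered on the collection day
    (date(2000, 1, 1), date(2000, 1, 9)),  # entered eight days later
]

lags = [entry_lag_days(collected, entered) for collected, entered in tests]

# Hypothetical window: entry expected 1 to 3 days after collection.
flags = flag_implausible(lags, min_days=1, max_days=3)
```

With these made-up dates, the second and third tests are flagged, one for being entered too soon and one for being entered too late.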

ws_panel_anonymized.dta: This is a water-source-level panel dataset in long format.  Each row is a water source tested at either baseline or follow-up.  At baseline, we tested all water sources in all study communities.  At follow-up, we tested only water sources that a sample household reported using.

The directory "survey_instruments" contains the survey instruments used to collect data.  Note that the replication data files are not comprehensive, but only contain the variables necessary to replicate the analysis in this paper.  

The directory "dofiles" contains two dofiles needed to replicate the analyses from the main paper and the appendices:

replication_make_data.do: This dofile processes the data from the format provided into the format required for analysis, saving the datasets for analysis in the directory data/temp. Among other things, the code processes the fecal contamination data to account for differences in how they were processed and creates volume-weighted average variables for cases where households use multiple sources.   
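
For readers unfamiliar with volume weighting, a minimal sketch follows.  It is written in Python rather than Stata and is not taken from replication_make_data.do: the function name and the example household are hypothetical.  The idea is that each source's value is weighted by the volume of water the household draws from it, i.e. the sum of (volume x value) across sources divided by the total volume.

```python
def volume_weighted_average(values, volumes):
    """Average of `values`, weighting each entry by the matching volume."""
    if len(values) != len(volumes):
        raise ValueError("values and volumes must have the same length")
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    weighted_sum = sum(value * volume for value, volume in zip(values, volumes))
    return weighted_sum / total_volume


# Hypothetical household: 30 litres from a source that tested positive (1)
# and 10 litres from a source that tested negative (0).
contaminated_share = volume_weighted_average([1, 0], [30, 10])
```

In this made-up case three quarters of the household's water comes from the contaminated source, so the weighted value is 0.75 rather than the unweighted 0.5.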

replication_code.do: This dofile runs regressions and creates the tables and figures.  Note that Figures 1-6b are not created by the replication code, either because they were produced by hand or because we do not provide the specific locations of the study communities, in order to preserve their anonymity.

We also provide two additional dofiles in the folder for reference.  

make_rbi_tvsc.do: This is the code we used to create the dummy variables for reshuffled treatment status (included in data_anonymized.dta as control_trvsco* and treated_trvsco*).  Note, however, that running the code does not exactly replicate the randomization draws in the datafile, probably because we neglected to fix the Stata version we used to generate the draws.
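
The reproducibility issue can be illustrated with a minimal sketch of a seeded reshuffle.  It is written in Python rather than Stata and is not the code in make_rbi_tvsc.do: the function name, the seed value and the eight-unit assignment are hypothetical.  A fixed seed reproduces a draw only when the same random-number generator is used, which is why a Stata dofile typically fixes both the version (the version command) and the seed (set seed).

```python
import random


def reshuffle_treatment(assignment, seed):
    """Return a permuted copy of `assignment`, reproducible for a given seed."""
    rng = random.Random(seed)    # private generator, so global state is untouched
    shuffled = list(assignment)  # copy, leaving the original assignment intact
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical assignment: 3 treated (1) and 5 control (0) units.
original = [1, 1, 1, 0, 0, 0, 0, 0]

draw_a = reshuffle_treatment(original, seed=12345)
draw_b = reshuffle_treatment(original, seed=12345)
```

Here draw_a and draw_b are identical because the seed and the generator are the same, and each reshuffle keeps the number of treated units unchanged.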

power_calcs_annotated.do: This is the code we used for power calculations.  Again, note that the results are slightly, although not substantively, different from those we report in the pre-analysis plan and appendices, either because we changed the cleaning protocols, leading to slight changes in the data, or because we did not fix the Stata version, or both.
